Abstract: The Electronic Health Record (EHR) is a digital, patient-centric platform on which comprehensive health information is stored, managed, and accessed electronically. This study aims to secure sensitive patient data and increase overall system resilience by demonstrating that machine learning can evaluate vulnerabilities and improve the security of EHR systems. It examines the prospects of incorporating machine learning-driven assessment tools and safety improvements into EHRs to enhance data protection in the healthcare industry. The proposed method uses two machine learning classifiers, XGBoost and LightGBM, to strengthen data protection and security within the EHR framework. The study emphasizes the effectiveness of these classifiers in keeping EHR systems secure against threats posed by external factors or hackers. The findings show that the XGBoost model performs consistently well, with a Receiver Operating Characteristic (ROC) curve whose AUC equals 1.00, indicating near-perfect discrimination between positive and negative cases. LightGBM likewise yields a perfect ROC curve. Future work could explore more sophisticated machine learning models than those developed here. Improving data storage through encryption and building safer communication protocols should also be considered so that these systems can withstand new security problems. This study thus contributes to the existing literature on applying technology to safeguard vulnerable medical records while fostering a safe and efficient healthcare ecosystem.


Introduction 

There is growing interest in using data from electronic health records (EHRs) for patient registries. This study aimed to examine how EHR interoperability impacts patient safety and other dimensions of care quality in high-income healthcare settings. EHRs are electronic systems used and maintained by healthcare organizations to collect and store patients' medical information. The study concluded that patient registries are patient-centered, purpose-driven, and designed to derive information on specific exposures and health outcomes.

These databases store a patient's medical history, diagnosis, prescription, immunization dates, allergies, radiographs, and test results. EHRs improve patient treatment coordination and medical professional communication. Healthcare administrators can make evidence-based choices, reduce medical errors, and speed up administrative operations with electronic health information. By increasing interoperability and data exchange, EHRs improve patient outcomes and healthcare efficiency, which promotes medical research, public health, and healthcare policy. EHRs, also referred to as Electronic Medical Records (EMRs), are electronic counterparts of traditional paper health records created, managed, and preserved by care providers (Mayer et al., 2020). These records are exclusively accessible to patient caregivers. Personal health records (PHRs) enable patients to manage and update their medical histories. EHRs are protected by the Health Insurance Portability and Accountability Act (HIPAA), whereas personal records are not (Himabindu et al., 2024). Patients' electronic health records document several healthcare practitioner visits in diverse places. Figure 1 depicts the EHR System.


Figure 1. Introduction to the EHR System (Kumar et al., 2029).

EHR includes demographics, concerns, prescriptions, vital signs, medical history, immunizations, laboratory data, and imaging results. EHRs streamline clinical processes. Quality management, outcome reporting, and evidence-based decision support, which can capture a patient's clinical experience, are among the capabilities of an EHR. Medical professionals, hospitals, and other healthcare facilities utilize EHRs as the current standard. Healthcare facilities preserve patients' medical history, but they might be hard to access. Some hospitals provide patients with physical or electronic copies of their information, whereas others with more advanced systems provide secure online access (Lee et al., 2021). The healthcare sector is rapidly adopting digital technologies such as artificial intelligence (AI), machine learning, big data analytics, smart sensors, the Internet of Things (IoT), and robotics to improve the efficiency and quality of care provided. Advanced countries are leading this trend (Lee et al., 2019). Recently, Deep Learning (DL) techniques have gained prominence for overcoming difficulties encountered by conventional Natural Language Processing (NLP) strategies for extracting data (Locke et al., 2021; Osmani et al., 2018).

The ability to autonomously learn representations from data is a significant factor contributing to the increasing popularity of deep learning (DL) models (Xiao et al., 2018). Their resilience to high-complexity functions grows alongside the scale of the dataset (Esteva et al., 2019). Clinical applications of ML/DL models hold considerable promise for revolutionizing the healthcare industry. However, various privacy and security issues must be resolved before these techniques can be reliably applied in healthcare systems (Qayyum et al., 2020). ML can potentially exacerbate pre-existing health inequities, posing several ethical problems (Chen et al., 2021). Bioethics concepts can be used to create morally sound ML models in the healthcare industry (Keerthana et al., 2024; Vayena et al., 2018). Security threats such as attacks on data confidentiality, privacy, and integrity are a major concern because these systems generate huge amounts of data at predictable intervals (Kumari et al., 2018). Strict access limits and other security measures protect patient data. Covered entities need internal operations controls. Protected Health Information (PHI) is used worldwide and supplied digitally. Healthcare privacy entails the preservation of confidential documents. Regulations and policies play a pivotal role in accomplishing this goal. Patients have the right to know who opens and uses their medical records. Patients' health information is protected by HIPAA (Hathaliya et al., 2020).

This research signifies noteworthy progress in data security by incorporating multiple state-of-the-art characteristics. Combining the secure asymmetric encryption of Elliptic Curve Cryptography (ECC) with the powerful symmetric encryption of Advanced Encryption Standard (AES) creates a triple-layered defense mechanism with the Hyperledger blockchain. Besides, this research employs enhanced machine learning classifiers, XGBoost and LightGBM, two widely accepted models with outstanding performances. Additionally, it enhances their predictability by employing an ensemble of the two classifiers, XGBoost and LightGBM, rather than relying on a single classifier, thereby yielding a more robust and accurate outcome. Furthermore, both depend on blockchain for the same reason that they have developed encryption processes as well as ML. The encrypted EHR data is moved to a blockchain where the integrity and immutability of the data are guaranteed. This demonstrates that the increased security measures do not violate blockchain technology's decentralized or secure nature.

Moreover, a strong validity condition implies that sharing should occur between authorized entities only. Additionally, enhanced security is achieved through advanced encryption techniques, ensuring the secure transmission of data and safeguarding information effectively. Ultimately, this investigation conforms to contemporary security protocols and hence adheres to internationally acceptable encryption standards. The utilization of AES as a symmetrical encryption standard and ECC as a prevailing asymmetrical encryption method supports current security norms. In comparison to simple methodologies employed earlier, this creates a more robust and secure framework for handling patient data.


Related Work 

This section reviews the relevant work by different authors.

Huang et al. (2023) employed transparent ML methods to form a top-down arrangement of major predictors using model importance statistics like gain, cover, and frequency. The results revealed an average age of 74.05 with a standard deviation of 12.85. The AUROC (Area under the Receiver Operating Characteristic Curve) for the XGBoost model was 0.662. The SHAP explanations with the greatest total values included urine output, leukocytes, bicarbonate, and platelets.

Yang et al. (2023) developed and validated a new ML model for predicting acute respiratory distress syndrome (ARDS) in Intensive Care Unit (ICU) patients. The AUC values of the respective models were as follows: Logistic Regression (LR) 0.664, K-Nearest Neighbour (KNN) 0.692, Support Vector Machine (SVM) 0.567, Decision Tree Classifier (DTC) 0.709, Random Forest (RF) 0.732, XGBoost 0.793, LightGBM 0.793, and CatBoost 0.817.

Shah et al. (2023) presented a new approach to improving network security and analyzing data derived from Personal Health Records (PHR). When they analyzed data from individual health records, they employed neural networks with variational Boltzmann spatial encoder capabilities. They achieved a more secure network by using the decentralized blockchain architecture. The experimental investigation was conducted using data and network security. It measured random accuracy at 81%, specificity at 55%, latency at 62%, quality of service at 52%, and computational cost at 41%.

Alam et al. (2023) provided an application called FedSepsis for early sepsis detection leveraging EHRs. Several deep learning (DL) methods were utilized for the prediction and NLP tasks. Performance was satisfactory, and when devices were moderately numerous, the outcomes in the federated learning configuration were comparable to those in the single server-centric configuration. The most effective approach was to use multimodality in conjunction with generative adversarial neural networks. The outcomes were a near-perfect accuracy rate of 96.55%, a receiver operating characteristic area of 99.35%, and a latency of 4.56 hours.

Corbin et al. (2022) explored the potential benefits of 
clinical decision support based on machine learning in the 
context of antibiotic prescribing management. A 
retrospective multi-site study was conducted, which 
trained ML models to anticipate antibiotic susceptibility 
patterns, also referred to as personalized antibiograms, 
using EHR data about 8342 infections at Stanford's 
emergency departments and 15,806 cases of 
uncomplicated UTIs at Boston's Massachusetts General 
Hospital and Brigham & Women's Hospital. Based on 
data from Stanford, clinicians were able to reallocate 
antibiotic selections with the help of tailored 
antibiograms, resulting in a coverage rate of 85.9%. This rate was comparable to clinician performance, which had been determined to be 84.3% (p = 0.11). The tailored antibiogram coverage percentage in the Boston dataset was 90.4%, which was significantly higher than the clinicians' rate of 88.1% (p < 0.0001).

Tsiklidis et al. (2022) developed a model that could continuously predict the likelihood of patient death or a risk metric. The model was evaluated using the AUROC, and the authors reported an accuracy level of 92.9%.

Pang et al. (2021) proposed seven machine learning models that utilized EHR data from up to 2 years earlier to predict the chances of children aged between 2 and 7 being obese. The seven models were compared using Cochran's Q test and post-hoc pairwise testing, while performance was evaluated using different standard classifier metrics. XGBoost outperformed all other models with an AUC of 0.81 (0.001). It also performed better than the other models on traditional classifier metrics: accuracy 66.14% (0.41%), specificity 63.27% (0.41%), precision 30.90% (0.22%), and F1-score 44.60% (0.26%).

Hou et al. (2020) utilized XGBoost to construct an ML model for predicting 30-day mortality in sepsis-3 patients from the MIMIC-III database and determined whether it outperformed traditional prediction models. According to the AUCs (0.819 [95% CI 0.800-0.838], 0.797 [95% CI 0.781-0.813], and 0.857 [95% CI 0.839-0.876]) and the decision curve analysis of the three models, the XGBoost model exhibited the best overall performance. This was validated by the risk nomogram and clinical impact curve, where the XGBoost model demonstrated good predictive value.

Souri et al. (2020) recommended an IoT-supported student health monitoring system in which smart medical devices discreetly track students' vital signs and any changes in their biology or behavior. The concept was to identify probable risks connected with behavioral and physiological changes by gathering important information from IoT devices and processing it with machine learning. The experimental results confirmed that the model functioned effectively and precisely for student health evaluations. In testing, the SVM achieved the highest accuracy of 99.1%, outperforming algorithms based on decision trees, random forests, and multilayer perceptron neural networks.

Vos et al. (2020) noted that EHRs could enhance collaboration among healthcare professionals, but their impact on teamwork remained unclear. When five outpatient clinics in a Dutch hospital with a comprehensive EHR system were examined, the research found mixed results. Although the system facilitated real-time coordination across specialties, it hindered interdisciplinary collaboration due to asynchronous access to patient records. While it streamlined certain tasks and facilitated data-based decision-making, specialized interfaces impeded data comprehension. Additionally, while it improved documentation efficiency, it also imposed rigid authorization requirements and increased administrative burdens on physicians, limiting flexibility.

Hirano et al. (2020) examined the vulnerability of COVID-Net, an open-source, publicly available, CNN-based model. This model was among the first deep learning models to detect COVID-19 using chest X-ray (CXR) images. Two kinds of attacks, targeted and nontargeted, were investigated using perturbations created by the fast gradient sign method (FGSM). The authors evaluated both the COVID-Net CXR small and CXR big models. Their results showed that, after adding 2% universal adversarial perturbations, the attacks attained success rates of >85% for non-targeted attacks and >90% for targeted attacks against both models.

Mandair et al. (2020) investigated the development of 
a machine-learning model aimed at predicting the 
incidence of myocardial infarction (MI) within six 
months, utilizing harmonized electronic health record 
(EHR) data. The findings demonstrated that, compared to 
alternative models, a combination of random under- 
sampling with deep neural network (DNN) classification 
proved more effective. The study included 2,531 patients diagnosed with MI and 2.25 million without an MI diagnosis. The classification accuracy of a deep neural network trained with random under-sampling was much higher than that of other approaches. The deep neural network showed only moderate benefits over logistic regression using known risk factors alone, with an F1 score of 0.092 and an AUC of 0.835.

Newaz et al. (2019) proposed a new security 
framework called HealthGuard based on machine 
learning to identify malicious activities in a Smart
Healthcare System (SHS). The results showed that
HealthGuard was an effective security framework for 
SHS, with an accuracy of 91% and an F1 score of 90%. 

Bhattacharya et al. (2019) presented a framework called Blockchain-Based Deep Learning as a Service (BinDaaS). It integrated blockchain and DL methods for sharing EHR records among multiple healthcare users and was carried out in two phases. Parameters such as accuracy, end-to-end latency, mining time, computation, and communication costs were used to compare the obtained results with those of existing state-of-the-art proposals. Based on the results obtained, BinDaaS surpassed all other systems.


Problem Statement 

The major problem is providing strong security and authenticity for electronic health record (EHR) systems in the face of evolving cyber threats. The main concerns are preventing unauthorized access, breaches, or tampering with data that may compromise patients' privacy and medical confidentiality. Furthermore, complicated healthcare environments consisting of several stakeholders and interrelated systems make it difficult to ensure secure data exchange and compatibility. This investigation involves assessing factors such as key verification, clarity in representation, and whether the system meets modern encryption standards in healthcare data security.


Dataset Description 

A widely used medical dataset, Medical Information Mart for Intensive Care III (MIMIC-III), is available on Kaggle. The MIMIC-III dataset is massive, anonymized, and publicly available. Each entry in the dataset is accompanied by an ICD-9 code, documenting the diagnoses and procedures performed. These codes are further subdivided into sub-codes, in most cases indicating specific circumstances surrounding them. The dataset comprises 112,000 clinical reports with an average length of 709.3 tokens and 1,159 top-level ICD-9 codes. On average, each report has been assigned 7.6 codes. These data contain vital signs, prescriptions, laboratory measures, observations and notes recorded by healthcare professionals; fluid balance, procedure codes, diagnostic codes, imaging reports; hospital length of stay; survival data; and additional patient information. This database supports applications like academic research and development or monitoring healthcare services.


Research Methodology 

This section analyzes the designed architecture within the framework of the research technique.

Technique Used

The techniques used in the proposed method are the Hash-based Message Authentication Code (HMAC) algorithm, the AES algorithm with ECC for encryption, blockchain, cloud computing, and ML classifiers.

HMAC Algorithm: HMAC is a popular technique used in many different types of EHR systems and other areas of cybersecurity (Vignesh et al., 2017). When messages or data are sent across different parts of an EHR system, HMAC is used to ensure that nothing has been tampered with along the way. HMAC is used to fingerprint sensitive patient data in EHR systems to ensure data integrity: any change, intentional or not, produces a different tag value that flags the file as compromised. HMAC can also authenticate data in transit. An HMAC produced with a secret key shared between sender and receiver can verify EHR data transfers; recalculating and comparing the HMAC at the receiving end detects unauthorized modification of data in transit (Gabriel et al., 2021). Combined with timestamps that confirm data currency, HMAC helps prevent replay attacks. When HMACs are produced using credentials, the system can also securely authenticate users.
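The tagging, verification, and timestamp check described above can be sketched with Python's standard `hmac` module. This is a minimal illustration, not the authors' implementation: the key value, the `sign_record` and `verify_record` helpers, the JSON payload layout, and the 300-second freshness window are all assumptions introduced here.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical values for illustration; a real system would load the key from a key store.
SHARED_KEY = b"demo-shared-secret"
MAX_AGE_SECONDS = 300  # assumed freshness window used to reject replayed messages


def sign_record(record: dict, key: bytes, now: Optional[float] = None) -> dict:
    """Serialize the record with a timestamp and attach an HMAC-SHA256 tag."""
    timestamp = int(time.time() if now is None else now)
    payload = json.dumps({"record": record, "ts": timestamp}, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_record(message: dict, key: bytes, now: Optional[float] = None) -> bool:
    """Recompute the tag, compare it in constant time, then check freshness."""
    expected = hmac.new(key, message["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["tag"]):
        return False  # payload or tag was modified, or the wrong key was used
    age = (time.time() if now is None else now) - json.loads(message["payload"])["ts"]
    return 0 <= age <= MAX_AGE_SECONDS


message = sign_record({"subject_id": "1001", "diagnosis": "SEPSIS"}, SHARED_KEY)
print(verify_record(message, SHARED_KEY))  # fresh, untampered message
```

A timestamp window only narrows the period in which a captured message can be replayed; a deployment that needs stronger guarantees would typically also track nonces or sequence numbers.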

AES Algorithm: EHR systems must implement the AES algorithm to protect patients' personal information. AES is a frequently used symmetric encryption technique, and its high degree of security and efficiency benefits healthcare applications such as EHR systems. When audit logs are encrypted, the audit trails are protected against alteration and all parties remain accountable (McGhin et al., 2019). HIPAA and other healthcare standards require EHR systems to encrypt patient data and organizations to adopt procedures that limit liability.

ECC: ECC is a robust encryption method that can be used to set up safe lines of communication within EHR systems. Because of its robust security and reduced key lengths, it is computationally efficient and well-suited for resource-constrained contexts like those found in healthcare devices.

Blockchain: Blockchain technology could revolutionize healthcare EHR systems, and its use in EHRs has many benefits. Its distributed, unalterable ledger improves data security and integrity. Data breaches, hacking, and tampering with patient records are greatly reduced by blockchain technology. Each patient data transaction is saved as a block and linked to the one preceding it. Blockchain technology also addresses healthcare data exchange and interoperability issues (Huang et al., 2019). The network maintains data accuracy and consistency by making patient data transfer between healthcare providers secure and fast. Smart contracts improve interoperability and permit automatic data transfer under certain conditions. Blockchain also lets patients manage their health records. Patients can grant and revoke cryptographic keys to restrict data access to those who need it. This open strategy protects patient privacy and consent (Mishra et al., 2023). Figure 2 depicts the working of blockchain technology.

Cloud Computing: Cloud computing in EHR systems is altering healthcare by making patient health information management, storage, and access flexible and efficient. Cloud-based EHR technology benefits healthcare workers and changes patient data storage and access. It allows hospitals to scale their data storage and processing capacities without investing in additional facilities, which is a huge benefit. The ability to view patient records from anywhere with an internet connection improves medical staff mobility and accessibility, leading to faster and better medical decisions. Healthcare providers can improve patient care and quality by eliminating the need for pricey on-premises hardware and software. Moreover, cloud-based EHR solutions must prioritize data security and compliance (Chenthara et al., 2019).


Int. J. Exp. Res. Rev., Vol. 43: 160-175 (2024) 


Figure 2. Blockchain Technology (Ahmadi and Aslani, 2018).

ML Classifiers: Electronic Health Record machine learning classifiers play a crucial role in illness prediction, patient risk stratification, and treatment recommendation. EHR systems utilize machine learning algorithms, such as neural networks, decision trees, and support vector machines (SVM), to manage extensive patient data, allowing those engaged in the health sector to anticipate likely outcomes. ML classifiers might result in a high-quality healthcare service that everyone can afford because they detect diseases before they become severe, predict hospital re-admissions, and optimize treatment options (Hasan et al., 2029). The machine learning classifiers used here are XGBoost and LightGBM.

i) XGBoost: XGBoost (Extreme Gradient Boosting) is a machine learning classifier that is known for its efficiency and effectiveness. In EHR systems, XGBoost has performed well in predictive modeling tasks such as disease diagnosis and risk prediction (Romeo et al., 2020). It can also handle missing values, deal with complex interactions between features, and measure how important each feature is in healthcare analytics. Using XGBoost, the accuracy of illness forecasting could increase, high-risk patients could be identified, and healthcare practitioners could make the most of limited resources.

ii) LightGBM: LightGBM is another gradient-boosting technique that can handle many features and very large datasets. LightGBM has a range of applications in EHR systems, including prognosis modeling, drug response modeling, and outlier detection. It is well suited for real-time health data applications as it trains very quickly and can be applied to big datasets. LightGBM assists healthcare organizations in improving their forecasting capabilities, leading to faster preventive actions and personalized treatment. Likewise, it finds deviations or anomalies in patient information that could help detect health conditions early, thereby enhancing patient safety (Chami et al., 2019).

Proposed Methodology

Figure 3 presents the proposed method in diagrammatic form and outlines a system for storing and retrieving electronic health records (EHRs) using blockchain, HMAC authentication, encryption, and machine learning (ML) classification.

Figure 3. Proposed Methodology.

Proposed Algorithm


The individual steps are expressed in mathematical notation as follows:

Step 1: Registration of Patient:

Assume $P$ represents the patient set, where $P$ consists of elements $p_1$ to $p_n$. On registering a new patient, add the patient to the set $P$:

$$P = P \cup \{p_{n+1}\}$$


Step 2: HMAC authentication:

The authentication code is generated using the HMAC function in the following way:

$$\mathrm{HMAC}(K, M) = H\big((K \oplus opad) \,\|\, H((K \oplus ipad) \,\|\, M)\big)$$

Where $\oplus$ signifies the bitwise XOR operation, $K$ denotes the secret key, $M$ represents the message for authentication, $ipad$ signifies the inner padding (repeated byte, typically 0x36), $opad$ represents the outer padding (repeated byte, typically 0x5C), $\|$ denotes concatenation, and $H$ stands for a cryptographic hash function (e.g., SHA-256, SHA-512).

The authentication code can be calculated as follows:

authentication_code = HMAC(key, patient_info)
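To make the HMAC formula in Step 2 concrete, the sketch below builds it by hand from SHA-256 and checks the result against Python's standard `hmac` module. The function name `hmac_sha256` and the sample key and message are illustrative assumptions; in practice the library routine should be used directly.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Compute H((K xor opad) || H((K xor ipad) || M)) with H = SHA-256."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()  # over-long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")    # then zero-padded to the block size
    inner_key = bytes(b ^ 0x36 for b in key)  # K xor ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # K xor opad
    inner_hash = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner_hash).digest()


key = b"hypothetical-secret-key"
patient_info = b'{"subject_id": "1001"}'
authentication_code = hmac_sha256(key, patient_info)

# The hand-built value matches the standard library implementation.
print(authentication_code == hmac.new(key, patient_info, hashlib.sha256).digest())
```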

Step 3: Encryption using AES + ECC:

Encrypting the message using AES: $C_{AES} = \mathrm{AES\_Encrypt}(M, K_{AES})$
Encrypting the AES key: $C_{ECC} = \mathrm{ECC\_Encrypt}(K_{AES}, K_{pub})$
Combining the ciphertexts: $\mathrm{Final\_Ciphertext} = (C_{AES}, C_{ECC})$
Decrypting the AES key: $K_{AES} = \mathrm{ECC\_Decrypt}(C_{ECC}, K_{priv})$
Decrypting the message: $M = \mathrm{AES\_Decrypt}(C_{AES}, K_{AES})$

Where $M$ represents the plaintext message and $K_{AES}$ is the AES symmetric key. $K_{pub}$ and $K_{priv}$ denote the ECC public and private keys respectively, while $C_{AES}$ and $C_{ECC}$ denote the AES and ECC ciphertexts respectively.

Let $E(m, k)$ be the AES + ECC encryption function, with $m$ as the message and $k$ as the public key.

Encrypted EHR data can be shown as:
encrypted_data = E(EHR_data, public_key)


Step 4: Blockchain upload:

Let $B$ denote the blockchain, and $T$ represent the transaction containing the encrypted EHR data along with the hashes of the previous and present blocks. Let $D$ denote the input data, $S$ the current state of the blockchain, and $S'$ the resulting updated state.

The formula for updating the blockchain state is given by:

$$S' = \mathrm{Hash}(\mathrm{Encrypt}(D) + \mathrm{Hash}(S))$$

The process of transferring encrypted EHR data into a blockchain platform can be illustrated as follows:

T = {encrypted_data, previous_block_hash, current_block_hash}
$$B = B \cup \{T\}$$

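The block-linking idea in Step 4 can be illustrated with a small in-memory hash chain. This is only a sketch under stated assumptions: it uses a plain Python list rather than Hyperledger, involves no consensus or networking, and the helper names `block_hash`, `append_block`, and `chain_is_valid` as well as the all-zero genesis hash are hypothetical choices made here.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(block: dict) -> str:
    """Hash every field of the block except its own stored hash."""
    body = {k: v for k, v in block.items() if k != "current_block_hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_block(chain: list, encrypted_data: str) -> dict:
    """Build T = {encrypted_data, previous_block_hash, current_block_hash} and add it to B."""
    previous = chain[-1]["current_block_hash"] if chain else GENESIS_HASH
    block = {"encrypted_data": encrypted_data, "previous_block_hash": previous}
    block["current_block_hash"] = block_hash(block)
    chain.append(block)
    return block


def chain_is_valid(chain: list) -> bool:
    """Check that each block links to its predecessor and still matches its hash."""
    previous = GENESIS_HASH
    for block in chain:
        if block["previous_block_hash"] != previous:
            return False
        if block["current_block_hash"] != block_hash(block):
            return False
        previous = block["current_block_hash"]
    return True


ledger = []
append_block(ledger, "04a41558...")  # truncated ciphertext used as sample data
append_block(ledger, "9b7c02aa...")
print(chain_is_valid(ledger))
```

Because each block's hash covers the previous block's hash, altering any stored record changes its hash and breaks the link to every later block, which is what makes tampering detectable.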
Step 5: Verification of the key condition:

Let K denote the pre-shared secret key and received_key the key received from the other medical center requesting the data.

The key verification condition can be expressed as:

IF (received_key == K)
    THEN receive_data()
    ELSE end_process()
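The key check in Step 5 might look as follows in Python. The sketch assumes the keys are byte strings and swaps the plain `==` for `hmac.compare_digest`, a constant-time comparison that avoids leaking how many leading bytes matched; `handle_request` and the return strings are placeholders, not part of the original method.

```python
import hmac


def receive_data() -> str:
    """Placeholder for releasing the requested EHR data."""
    return "data released"


def end_process() -> str:
    """Placeholder for terminating the request."""
    return "request rejected"


def handle_request(received_key: bytes, shared_key: bytes) -> str:
    """Release data only when the received key equals the pre-shared key K."""
    if hmac.compare_digest(received_key, shared_key):
        return receive_data()
    return end_process()


K = b"hypothetical-pre-shared-key"
print(handle_request(b"hypothetical-pre-shared-key", K))
print(handle_request(b"some-other-key", K))
```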


Step 6: ML classification using XGBoost and LightGBM:

Let $X$ represent the input data, $Y$ denote the output labels, $f_{XG}$ denote the XGBoost classifier function, and $f_{LGBM}$ denote the LightGBM classifier function.

The classification process using the ML classifiers (XGBoost and LightGBM) is as follows:

For XGBoost:

$$Y_{predicted,XG} = f_{XG}(X_{train})$$

$$\mathrm{Obj}(\mathrm{XGBoost}) = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

For LightGBM:

$$Y_{predicted,LGBM} = f_{LGBM}(X_{train})$$

$$\mathrm{Obj}(\mathrm{LightGBM}) = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

Where $n$ represents the number of training samples, $L(y_i, \hat{y}_i)$ denotes the loss function measuring the difference between the true label $y_i$ and the predicted label $\hat{y}_i$, $K$ is the number of trees in the ensemble, and $\Omega(f_k)$ signifies the regularization term penalizing the complexity of each tree.
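As a worked illustration of the regularized objective, the sketch below evaluates a loss term plus a per-tree penalty for a toy example. The paper does not state which loss or penalty it uses, so two assumptions are made here: binary log loss for $L$, and the penalty form $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_j w_j^2$ from the XGBoost formulation, where $T$ is the number of leaves and $w_j$ are the leaf weights. The function names and sample values are hypothetical.

```python
import math


def log_loss(y_true: int, p: float) -> float:
    """Binary log loss for one sample; p is the predicted probability of class 1."""
    eps = 1e-15
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def tree_penalty(leaf_weights, gamma: float, lam: float) -> float:
    """Omega(f) = gamma * T + 0.5 * lambda * sum(w_j^2) for a single tree."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_prob, trees, gamma: float = 0.0, lam: float = 1.0) -> float:
    """Sum of per-sample losses plus the sum of per-tree complexity penalties."""
    loss = sum(log_loss(y, p) for y, p in zip(y_true, y_prob))
    penalty = sum(tree_penalty(weights, gamma, lam) for weights in trees)
    return loss + penalty


# Two samples and one tree with two leaves (illustrative numbers only).
value = objective([1, 0], [0.5, 0.5], [[1.0, -1.0]], gamma=1.0, lam=2.0)
print(round(value, 4))
```

Raising `gamma` or `lam` increases the penalty for larger trees or larger leaf weights, which is how the regularization term discourages overly complex ensembles.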

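Subsequently, accuracy is calculated using the predicted values as follows:

$$\mathrm{Accuracy}_{XGBoost} = \frac{TP_{XGBoost} + TN_{XGBoost}}{TP_{XGBoost} + FP_{XGBoost} + TN_{XGBoost} + FN_{XGBoost}}$$

$$TP_{XGBoost} = \sum_{i=1}^{N} \mathbb{I}(\hat{y}_{XGBoost}[i] = 1 \text{ and } y_{true}[i] = 1)$$
$$TN_{XGBoost} = \sum_{i=1}^{N} \mathbb{I}(\hat{y}_{XGBoost}[i] = 0 \text{ and } y_{true}[i] = 0)$$
$$FP_{XGBoost} = \sum_{i=1}^{N} \mathbb{I}(\hat{y}_{XGBoost}[i] = 1 \text{ and } y_{true}[i] = 0)$$
$$FN_{XGBoost} = \sum_{i=1}^{N} \mathbb{I}(\hat{y}_{XGBoost}[i] = 0 \text{ and } y_{true}[i] = 1)$$

Similarly,

$$\mathrm{Accuracy}_{LightGBM} = \frac{TP_{LightGBM} + TN_{LightGBM}}{TP_{LightGBM} + FP_{LightGBM} + TN_{LightGBM} + FN_{LightGBM}}$$

$$TP_{LightGBM} = \sum_{i=1}^{N} \mathbb{I}(\hat{y}_{LightGBM}[i] = 1 \text{ and } y_{true}[i] = 1)$$
$$TN_{LightGBM} = \sum_{i=1}^{N} \mathbb{I}(\hat{y}_{LightGBM}[i] = 0 \text{ and } y_{true}[i] = 0)$$
$$FP_{LightGBM} = \sum_{i=1}^{N} \mathbb{I}(\hat{y}_{LightGBM}[i] = 1 \text{ and } y_{true}[i] = 0)$$
$$FN_{LightGBM} = \sum_{i=1}^{N} \mathbb{I}(\hat{y}_{LightGBM}[i] = 0 \text{ and } y_{true}[i] = 1)$$

Here $\mathbb{I}(\cdot)$ is the indicator function and $N$ is the number of evaluated samples.

$$accuracy_{XG} = \mathrm{accuracy\_score}(Y_{test}, Y_{predicted,XG})$$

$$accuracy_{LGBM} = \mathrm{accuracy\_score}(Y_{test}, Y_{predicted,LGBM})$$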
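The accuracy definition can be checked by hand with a few lines of plain Python. The label lists below are made-up values for illustration only and are not results from the study; `confusion_counts` and `accuracy` are hypothetical helper names.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels using the indicator sums."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == 1 and t == 1)
    tn = sum(1 for t, p in pairs if p == 0 and t == 0)
    fp = sum(1 for t, p in pairs if p == 1 and t == 0)
    fn = sum(1 for t, p in pairs if p == 0 and t == 1)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred) -> float:
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples to evaluate")
    return (tp + tn) / total


# Illustrative labels only.
y_test = [1, 0, 1, 0, 1]
y_predicted = [1, 0, 0, 1, 1]
print(confusion_counts(y_test, y_predicted))  # (2, 1, 1, 1)
print(accuracy(y_test, y_predicted))          # 0.6
```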
Result and Discussion

The study assessed the efficacy of the proposed ML-driven security architecture for EHR applications. The architecture combines multiple encryption methods with advanced machine learning classifiers, which identify and prevent possible security breaches within such vital healthcare databases. These models showed strong capabilities in detecting patterns and potential risks, making EHR systems better protected from cyber threats. This section discusses how the ML-driven security framework detects and prevents potential attacks on electronic health record systems.


Figure 4 illustrates the blockchain-based method of generating blocks. It starts with a blank new patient registration form on a website that has fields for username, password, and confirm password, followed by a signup button. Then, the EHR system's Add New Patient screen is used to capture important demographic and clinical information about patients in the set P, referred to as p1 to pn. The details recorded on this screen include row_id, subject_id, hadm_id, admittime, dischtime, admission_type, admission_location, discharge_location, insurance details, language preferred,

[Figure 4 panels. Input: Login form (Username, Password) and Add New Patient Record form with fields row_id, subject_id, hadm_id, admittime, dischtime, deathtime, admission_type, admission_location, discharge_location, insurance, language, religion, marital_status, ethnicity, edregtime, edouttime, diagnosis, hospital_expire_flag, has_chartevents_data. Output: Patient Record, listed below.]

1. "row_id": "1"
2. "subject_id": "1001"
3. "hadm_id": "142345"
4. "admittime": "23-10-2164 21:09"
5. "dischtime": "1/11/2164 5:15:00 PM"
6. "deathtime": "NA"
7. "admission_type": "EMERGENCY"
8. "admission_location": "EMERGENCY ROOM ADMIT"
9. "discharge_location": "HOME HEALTH CARE"
10. "insurance": "Private"
11. "language": "English"
12. "religion": "NOT SPECIFIED"
13. "marital_status": "MARRIED"
14. "ethnicity": "WHITE"
15. "edregtime": "23-10-2164 16:43"
16. "edouttime": "23-10-2164 23:00"
17. "diagnosis": "SEPSIS"
18. "hospital_expire_flag": "0"
19. "has_chartevents_data": "1"
20. "prev_block": {
21. "hash": "13294bbb241ad95b1964ce4431688a3c", "filename": "40"}

Figure 4: Block Chain-based block generation process


Encryption public key: Oxd373abfc29F39e0ed961ebff0a3F92880a83e9 3 f7efb964e668eOdaSecBFf2a11312296F adedbeb264ad07b71dc5522795ce7b34372f 696de7c 
e73fd39ed67 

Decryption private key: 0x260a765c0476cOaf3508018b72c4176be04152b96dFad0b431a462131434bb6f 

Plaintext size: 266 

Encryption time: 0.011214733123779297 

Encrypted size: 363 

Encrypted: b'04a4155869d4dc37F324c59339bf616e024ealealefcact91d63a9ce727a033707 2bec3f3e70f125770F 27af bbea7f1a7b438a7c14acb2d0ec6216e903177565b 
216d136bb963292214cc94f666deb32ad67b04a28e9380cc511d6928e17e45f7Fdc876acSedabe232Fa91F2e7b306a1a5184358d3e3eBf5922b719¢c4212F 23ee3ba4f3299Fc 
655b33F89c8F5982d37236a95500b0c780e8b04799a39 7 36c6899804724655992092874b0F230b6e04cC6308907 6deS62b86cc2azea83e75f43bf9a6Iabce4O684ef 3523f e555 
3e3ce25afded6cbd2b2711183580c0545e5d9499af41b9c2acae7956437066199637Fd967e7737a0d5b8013a0ed15530df88132017279594697Safc3bf836c3975e9b37d07c32a 
33a93ab2F66120144b6d68215820cfedf1e387b2372b89776d74cF1Fa254bc1ec4677cdeae7e6692ef a0c20386c02c010db358d3b80d7 76f abaed3d1440850c753bb450d69b557 
£45c2089d5F769F54F3ba4a86adac' 

Enter Private key : 6x260a765c0476cOaf3508018b72c4176be04152b96dfad0b431a462131434bb6f 

Decryption time: 0.0015707015991210938 


Figure 4. Data encryption and decryption process. 
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religion, marital_status, ethnicity, edregtime, downtime, 
diagnosis, and hospital_expire_flag. The last figure 
shows a code from a medical database. The above-given 
code contains patient identifiers like row ID, subject ID, 
and admit and discharge time. This also contains other 
details about the patient, including his or her being 
admitted type, location, insurance, language, religion, 
marital status, and ethnicity. Further, this code has dates 
reflecting the time of registration of the patient at the 
hospital, which later led to their cancelation from 
hospitalization and occurrence of death since the patient 
was not discharged alive. At last, the code has a hash and 
filename. 
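The linking of a patient record to the previous block can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the helper names `hash_block`, `make_block`, and `verify_link` are hypothetical, SHA-256 over a sorted-key JSON serialization is assumed as the hash (the source does not name the hash function), and the records are toy values.

```python
import hashlib
import json
from typing import Optional


def hash_block(block: dict) -> str:
    """Hash a block's canonical JSON form (SHA-256 is an assumption here)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_block(record: dict, prev_block: Optional[dict], prev_filename: str) -> dict:
    """Copy the patient record and attach the previous block's hash and filename."""
    block = dict(record)
    block["prev_block"] = {
        "hash": hash_block(prev_block) if prev_block is not None else "",
        "filename": prev_filename,
    }
    return block


def verify_link(block: dict, prev_block: dict) -> bool:
    """Check that the stored hash still matches the previous block's content."""
    return block["prev_block"]["hash"] == hash_block(prev_block)


# Toy records, invented for illustration only.
genesis = make_block({"row_id": "0", "subject_id": "1000"}, None, "")
block_1 = make_block(
    {"row_id": "1", "subject_id": "1001", "diagnosis": "SEPSIS"}, genesis, "40"
)

print(verify_link(block_1, genesis))
```

Because each block stores the hash of its predecessor, changing any field of an earlier record changes that hash, and `verify_link` then fails for the block that follows it.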


These findings highlight the capability of machine learning classifiers to enhance the security and privacy of EHRs, and so help safeguard sensitive patient data within the ever-evolving healthcare technology environment. The results are shown in the figures below: the data encryption and decryption process, encryption time versus decryption time, and plaintext size versus encrypted size.

The encryption and decryption process is illustrated in Figure 5. It shows a piece of text being encrypted and then decrypted. The output lists the public key, the private key, the plaintext size, the encryption time, the encrypted size, and the encrypted text. The user is then prompted to provide the private key required to decrypt the text.

Figure 6 plots the encryption time against the decryption time. The x-axis shows the time taken to encrypt, in seconds, and the y-axis shows the time taken to decrypt; both axes run from 0.0025 to 0.0200. The graph shows that encryption takes longer than decryption, which matches the run in Figure 5, where encryption took about 0.0112 s and decryption about 0.0016 s. The graph also shows that decryption time increases as encryption time increases, because more complex encryption operations take longer to run than less complex ones.

Figure 6. Encryption time vs. decryption time. [x-axis: Encryption time; y-axis: Decryption time.]

Figure 7 compares the plaintext size with the encrypted size. The y-axis shows the encrypted size, ranging from 350 to 400, and the x-axis shows the plaintext size, ranging from 270 to 300.

Figure 7. Plaintext size vs. encrypted size.

The classification results are shown in the confusion matrices below for the XGBoost and LightGBM models.


Figure 8 displays the confusion matrix for the XGBoost model, illustrating the model's classification performance. Rows represent true labels and columns represent predicted labels; each cell counts the instances with that combination, so the off-diagonal cells are the cases where the prediction differed from the actual label. Figure 9 shows the confusion matrix of the LightGBM model, which reveals strong performance, with most instances falling on the diagonal, affirming the model's suitability for the task.

Figure 8. Confusion matrix of the XGBoost model. [Axes: True Labels, Predicted Labels.]

Figure 9. Confusion matrix of the LightGBM model.

Table 1 presents a comparative analysis of previous methodologies alongside the proposed approach on the MIMIC-III dataset. Huang et al. (2023) achieved 66.2% accuracy using XGBoost. Yang et al. (2023) reported varying accuracies: 70.9% for DTC, 73.2% for RF, 79.35% for XGBoost, 79.3% for LightGBM, and 81.7% for CatBoost. Tsiklidis et al. (2022) reported accuracies of 74% for SVM, 78.7% for LR, 87.2% for Gaussian Naive Bayes, and 92.9% for the GB Classifier. Hou et al. (2020) attained 81.9% accuracy with XGBoost. The proposed method achieved 100% accuracy using both the XGBoost and LightGBM models.
Table 1. Comparative Analysis of Related Techniques.

Authors                   Techniques                                         Accuracy
Huang et al. (2023)       XGBoost                                            66.2%
Yang et al. (2023)        DTC, Random Forest, XGBoost, LightGBM, CatBoost    70.9%, 73.2%, 79.35%, 79.3%, 81.7%
Tsiklidis et al. (2022)   SVM, LR, Gaussian Naive Bayes, GB Classifier       74%, 78.7%, 87.2%, 92.9%
Hou et al. (2020)         XGBoost                                            81.9%
Proposed Method           XGBoost, LightGBM                                  100%, 100%


Figure 10 and Figure 11 show the ROC curves of XGBoost and LightGBM. Both models perform exceptionally well, as indicated by their ROC curves.

The XGBoost model has a nearly perfect ROC curve with an AUC (Area Under the Curve) of 1.00, indicating an excellent ability to correctly classify positive and negative cases. Likewise, the ROC curve for the LightGBM model has an AUC of 1.00, showing that it also separates the two classes. Both machine learning models show outstanding binary classification performance in these experiments, making them very effective for the MIMIC-III dataset.
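To make the AUC figure concrete, the sketch below computes it directly from its pairwise definition: the AUC equals the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy scores are assumptions for illustration and are not the study's code or data.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy scores, invented for illustration only.
y_true = [0, 0, 1, 1]
separable = [0.1, 0.2, 0.8, 0.9]    # every positive outranks every negative
overlapping = [0.1, 0.8, 0.2, 0.9]  # one negative outranks one positive

auc_separable = roc_auc(y_true, separable)
auc_overlapping = roc_auc(y_true, overlapping)
print(auc_separable, auc_overlapping)
```

An AUC of 1.00 therefore means that every positive case in the evaluated set was scored above every negative case; a single misordered pair in the second toy example lowers the value to 0.75.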


Figure 10. ROC curve of XGBoost. [Plot title: Receiver Operating Characteristic (ROC) Curve - XGBoost; y-axis: True Positive Rate; legend: ROC curve XGBoost (AUC = 1.00).]

Figure 11. ROC curve of LightGBM. [Plot title: Receiver Operating Characteristic (ROC) Curve - LightGBM.]

Conclusion

Electronic health records (EHRs) hold a vast amount of patient information and diagnostic data, most of which is important personal health information. As technology has advanced, sophisticated cyber threats have escalated, undermining the privacy and security of health information systems. As a result, privacy and security concerns present the largest and most important barrier to adopting EHRs. In conclusion, incorporating ML-driven assessments and secured EHRs would be a transformative solution for data protection in the healthcare sector. Using machine learning, blockchain, and encryption algorithms to test and improve the security of EHR systems worked very well in this study, especially with the XGBoost and LightGBM models included in the proposed method. The results showed that the XGBoost model had exceptional performance, with a nearly perfect ROC curve and an AUC of 1.00, indicating high accuracy in classifying positive versus negative cases. The LightGBM model likewise achieved a perfect ROC curve.

In the future, more sophisticated ML models, advanced data encryption techniques, and secure communication protocols can make the proposed model strong enough to withstand emerging threats and increase its diagnostic capabilities.